Exploiting Concept Clumping for Efficient Incremental E-Mail Categorization
Abstract
We introduce a novel approach to incremental e-mail categorization based on identifying and exploiting “clumps” of messages that are classified similarly. Clumping reflects the local coherence of a classification scheme and is particularly important in a setting where the classification scheme is dynamically changing, such as in e-mail categorization. We propose a number of metrics to quantify the degree of clumping in a series of messages. We then present a number of fast, incremental methods to categorize messages and compare the performance of these methods with measures of the clumping in the datasets to show how clumping is being exploited by these methods. The methods are tested on 7 large real-world e-mail datasets of 7 users from the Enron corpus, where each message is classified into one folder. We show that our methods perform well and provide accuracy comparable to several common machine learning algorithms, but with much greater computational efficiency.
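As a rough illustration of the idea, and not the metrics or methods of the paper itself, the sketch below measures one very simple notion of clumping (how often a message lands in the same folder as the previous message) and runs a trivial incremental categorizer that predicts the folder last used for the same sender. The names `repeat_rate`, `LastFolderBySender`, and `incremental_accuracy`, and the use of the sender as the only feature, are assumptions made for this example.

```python
def repeat_rate(labels):
    """Fraction of messages filed in the same folder as the previous message.

    A hypothetical, simplified clumping measure: values near 1.0 mean the
    folder sequence is made of long runs, values near 0.0 mean it alternates.
    """
    if len(labels) < 2:
        return 0.0
    repeats = sum(1 for prev, cur in zip(labels, labels[1:]) if prev == cur)
    return repeats / (len(labels) - 1)


class LastFolderBySender:
    """Hypothetical incremental categorizer: remember the last folder per sender."""

    def __init__(self):
        self.last_folder = {}

    def predict(self, sender):
        # Returns None when the sender has not been seen yet.
        return self.last_folder.get(sender)

    def update(self, sender, folder):
        # Constant-time update after the true folder becomes known.
        self.last_folder[sender] = folder


def incremental_accuracy(stream):
    """Predict each message before learning from it; return the hit rate."""
    model = LastFolderBySender()
    correct = 0
    total = 0
    for sender, folder in stream:
        if model.predict(sender) == folder:
            correct += 1
        model.update(sender, folder)
        total += 1
    return correct / total if total else 0.0
```

Under these assumptions, a stream with a high `repeat_rate` per sender would be expected to give a high `incremental_accuracy`, which is the kind of relationship between clumping and accuracy that the abstract describes comparing.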
Similar resources
Exploiting Concept Clumping for Efficient Incremental News Article Categorization
In this paper, we introduce efficient methods for incremental multilabel categorization of documents. We use concept clumping to efficiently categorize news articles into a hierarchical structure of categories. Concept clumping is a phenomenon of local coherences occurring in the data and it has been previously used for fast, incremental e-mail classification. We extend the definition of clumpi...
TRESTLE: Incremental Learning in Structured Domains using Partial Matching and Categorization
We present TRESTLE, an incremental algorithm for probabilistic concept formation in structured domains that builds on prior concept learning research. TRESTLE works by creating a hierarchical categorization tree that can be used to predict missing attribute values and cluster sets of examples into conceptually meaningful groups. It is able to update its knowledge by partially matching novel str...
Scalable packet classification with controlled cross-producting
doi:10.1016/j.comnet.2008.11.017. Packet classification is central among traffic classification techniques that categorize packets with a traffic descriptor or with user-defined criteria. This categor...
Preyssler Heteropoly Acid: An Efficient Catalyst for One-Pot Synthesis of Bis(dihydropyrimidinone)benzenes
A Hybrid Framework for Building an Efficient Incremental Intrusion Detection System
In this paper, a boosting-based incremental hybrid intrusion detection system is introduced. This system combines incremental misuse detection and incremental anomaly detection. We use a boosting ensemble of weak classifiers to implement the misuse intrusion detection system. It can identify new types of intrusions that do not exist in the training dataset for incremental misuse detection. As...